Additional cleaning of datasets needed:
- Clean up tale names regex in atu
- Finish ATU sequences in aft

Summary stats by ATU chapter:

| chapter | n_types | n_tales | pct_with_tales | tales_per |
|---|---|---|---|---|
| TALES OF MAGIC | 240 | 450 | 16.7 | 11.2 |
| OTHER TALES OF THE SUPERNATURAL | 36 | 72 | 19.4 | 10.3 |
| OTHER ANIMALS AND OBJECTS | 50 | 64 | 14.0 | 9.1 |
| RELIGIOUS TALES | 543 | 294 | 6.3 | 8.6 |
| ANIMAL TALES | 332 | 280 | 11.7 | 7.2 |
| ANECDOTES AND JOKES | 993 | 306 | 4.5 | 6.8 |
| FORMULA TALES | 53 | 52 | 18.9 | 5.2 |
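
The notes do not define the table columns. The figures are consistent with this reading, which is an assumption: `n_types` counts the ATU types in a chapter, `n_tales` counts the AFT tales assigned to those types, `pct_with_tales` is the share of types with at least one tale, and `tales_per` is the mean number of tales per type that has any. A minimal sketch of that computation in plain Python, on invented toy records rather than the real atu/aft tables (the function name and record layout are hypothetical):

```python
from collections import defaultdict

# Hypothetical toy records, invented for illustration only.
# atu_types: (chapter, ATU type id); aft_tales: (tale id, ATU type id).
atu_types = [
    ("FORMULA TALES", "2000"),
    ("FORMULA TALES", "2010"),
    ("FORMULA TALES", "2015"),
    ("FORMULA TALES", "2020"),
    ("TALES OF MAGIC", "300"),
    ("TALES OF MAGIC", "301"),
]
aft_tales = [("t1", "2010"), ("t2", "2010"), ("t3", "2015"), ("t4", "300")]


def chapter_stats(atu_types, aft_tales):
    """Per-chapter summary under the assumed column definitions."""
    # Number of AFT tales attached to each ATU type.
    tales_by_type = defaultdict(int)
    for _tale_id, type_id in aft_tales:
        tales_by_type[type_id] += 1

    # ATU types grouped by chapter.
    types_by_chapter = defaultdict(list)
    for chapter, type_id in atu_types:
        types_by_chapter[chapter].append(type_id)

    stats = {}
    for chapter, type_ids in types_by_chapter.items():
        counts = [tales_by_type.get(t, 0) for t in type_ids]
        n_with = sum(1 for c in counts if c > 0)
        stats[chapter] = {
            "n_types": len(type_ids),
            "n_tales": sum(counts),
            "pct_with_tales": round(100 * n_with / len(type_ids), 1),
            # Mean tales per type, over types that have at least one tale.
            "tales_per": round(sum(counts) / n_with, 1) if n_with else 0.0,
        }
    return stats


stats = chapter_stats(atu_types, aft_tales)
for chapter, row in stats.items():
    print(chapter, row)
```
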
The treemap below shows the nested sets of the ATU into which AFT
texts fall, by chapter, division, and
sub_division.
To do:
Notes/questions from Sándor:
- ATU markup segments motif abstracts
- TMI defines motifs by 1-2 sentences
- Relate words in both: co-occurrence
- Relate this co-occurrence matrix to AFT types
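
One possible reading of "relate words in both" is a word-by-word co-occurrence count between each TMI motif text and the ATU segment marked up with that motif. The sketch below assumes such pairs already exist. The example strings are invented, the tokenizer is deliberately naive, and real data would need stop-word removal and lemmatization first.

```python
import re
from collections import Counter

# Hypothetical (TMI motif text, ATU segment) pairs, invented for illustration.
pairs = [
    ("wolf swallows girl", "the wolf devours the girl"),
    ("hunter rescues girl", "a hunter cuts the wolf open"),
]


def tokens(text):
    """Lower-cased set of alphabetic word forms (naive tokenization)."""
    return set(re.findall(r"[a-z]+", text.lower()))


def cooccurrence(pairs):
    """Count, over all pairs, how often a TMI word and an ATU word appear together."""
    counts = Counter()
    for tmi_text, atu_text in pairs:
        for w_tmi in tokens(tmi_text):
            for w_atu in tokens(atu_text):
                counts[(w_tmi, w_atu)] += 1
    return counts


matrix = cooccurrence(pairs)
print(matrix[("girl", "wolf")])
```

The sparse `Counter` keyed by `(TMI word, ATU word)` can be turned into a dense matrix once the two vocabularies are fixed. Relating it to AFT types would then mean aggregating the same counts per type.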
Given three resources, each providing a fragment of the problem, is this enough for a solution?
- TMI: a list of motif names, but not definitions.
- ATU: a list of motif strings, aka tale types, built from TMI items.
- AFT: a selection of tale types as exemplification for the ATU, with frequent enough examples for some of the motif strings in some of the 8 topical genres.
8 genres with frequent enough examples of respective, typical motif strings built from the LEGO kit called the TMI. X samples of text with inherent motif strings for backbones; backbones as motif-based markup; motif “definition”; all three in running text.
- Motif: a 1-2 sentence summary of some recurrent content element with a function in the plot, relating actors in situations with tools of their resolution.
- Tale type: an abstract linking situations in shorthand, from setting through complication to resolution.

Open questions: Can real motifs be extracted from typical texts by means of theoretical strings of theoretical motifs? Is there a way to validate the TMI by automatic means, e.g. as one ML experiment out of many? Are these three resources enough to reach our goal?
Match between a label (TMI) and ill-bounded/ill-demarcated text fragments over an AFT set. Can we find a transformation that converts the set of segments into the label, by means of abstraction? Text summarisation is available in Python and with DL. Reverse problem: how to arrive at the text set from the label as a string. This depends on set size and topic composition, possibly a set of particular mixes.
Given a label and a set of text segments, arrive at that label by DL. Which architecture/method yields the best heuristics? Approximate the transformation by backpropagation (?). Consult JEK.
Add MFTL. LRRH. Custom-built for experimentation, for researchers with interest in the intersection of data science and folk tale studies. For work in progress.
For every motif in a string, compare the correlation between the TMI label and the ATU segment content with the correlation between the ATU segment content and the manually marked-up AFT segment set.
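
As a rough sketch of this comparison, the two relations can be scored per motif with any word-overlap measure. Jaccard similarity over word sets is used here purely as a stand-in for "correlation", which the notes do not specify. The function names and example strings are hypothetical.

```python
import re


def tokens(text):
    """Lower-cased set of alphabetic word forms (naive tokenization)."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Size of the intersection over size of the union; 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def motif_scores(tmi_label, atu_segment, aft_segments):
    """Return (label vs ATU segment, ATU segment vs pooled AFT segments)."""
    label = tokens(tmi_label)
    atu = tokens(atu_segment)
    # Pool the vocabulary of all AFT segments marked up with this motif.
    aft = set().union(*(tokens(s) for s in aft_segments))
    return jaccard(label, atu), jaccard(atu, aft)


# Invented example strings, not taken from the TMI, ATU, or AFT.
scores = motif_scores(
    "wolf swallows girl",
    "wolf devours girl",
    ["the wolf ate the girl", "the girl met a wolf"],
)
print(scores)
```

Comparing the two scores motif by motif would show where the chain from label to abstract to text holds and where it breaks. A measure built on normalized concepts rather than raw word forms would likely be more informative.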
Convert type sample to robust conceptual equivalent.
Then we could expose this tensor to all kinds of analysis, including DL by CNN (Johan’s favourite), or co-clustering (my bet).
As food for thought, consider this as a working hypothesis: “a motif is a multiple co-occurrence of concept strings anchored in the trilogy”. Whatever the outcome, negative or positive, the hypothesis can be tested, and we could learn if this definition can be falsified.
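
The notes do not spell out what the tensor or the "multiple co-occurrence" looks like. One possible reading, assumed here, is a three-way count over concept triples, one concept from each of TMI, ATU, and AFT, accumulated over motifs. The sketch uses invented concept sets and a sparse `Counter` instead of a dense array.

```python
from collections import Counter
from itertools import product

# Hypothetical concept sets per motif, in the order (TMI, ATU, AFT).
motifs = {
    "m1": ({"wolf", "girl"}, {"wolf", "girl", "devour"}, {"wolf", "girl"}),
    "m2": ({"hunter", "wolf"}, {"hunter", "wolf"}, {"wolf"}),
}


def triple_counts(motifs):
    """Sparse 3-way tensor: counts of (TMI concept, ATU concept, AFT concept)."""
    counts = Counter()
    for tmi, atu, aft in motifs.values():
        counts.update(product(sorted(tmi), sorted(atu), sorted(aft)))
    return counts


tensor = triple_counts(motifs)

# Concepts that occur in all three resources for at least one motif,
# i.e. the diagonal entries of the tensor.
anchored = {a for (a, b, c) in tensor if a == b == c}
print(sorted(anchored))
```

Under this reading, the hypothesis could be tested by checking whether entries with high counts line up with manually identified motifs. Co-clustering or a CNN would operate on the same tensor.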
Plus, look at the visuals from co-clustering results for ‘multiple cooccurrence’ as a GS query. Just 150 hits, which sounds quite promising for explaining the idea by references from multiple domains, i.e. methodological cross-pollination.
By concept strings in the TMI I would expect some normalization of word forms to concepts, just like e.g. Propp’s characters, actions/functions, situations etc. Here we could perhaps look into ontologies, if they exist. Thierry Declerck’s work comes to mind.